Engineering posts about Deep Learning
Curated summaries and key learnings for engineers working with Deep Learning.
Making User-Sequence Data More Cost-Efficient, Faster, and Easier to Use
This article discusses the redesign of a user-sequence platform aimed at improving the efficiency, speed, and usability of user data for machine learning applications. It addresses the challenges...
How Salesforce Built an AI Security Agent for Autonomous Threat Triage
The article outlines how Salesforce developed the SATA agent, an AI-driven system designed to enhance cybersecurity by autonomously triaging threats across complex environments. It highlights the...
Google Tensor SDK Beta with LiteRT
The Google Tensor ML SDK has transitioned from an Experimental Access Program to Beta, enabling developers to leverage the capabilities of the Google Tensor System-on-Chip (SoC) and its dedicated...
Creating a Multi-Tenant AI Agent Platform Handling 7K+ Sessions Without Cross-Team Interference
The article outlines the development of the Bring Your Own Planner (BYOP), a multi-tenant AI agent platform designed to enhance team autonomy and scalability within Salesforce. It addresses the...
Reel Friends: Building Social Discovery that Scales to Billions
This Meta Tech Podcast episode, featuring Pascal Hartig, explores the engineering behind the 'Friend Bubbles' feature of Facebook Reels. The discussion highlights the evolution of...
Enhancing Ad Relevance: Integrating Real-Time Context into Sequential Recommender Models
The article presents a novel approach to enhancing ad relevance by integrating real-time context into sequential recommender models. It highlights the limitations of previous models that relied...
Pushing the Frontier for Data Agents with Genie
The article presents Genie, a sophisticated data agent developed by Databricks, designed to enhance the analysis of both structured and unstructured enterprise data. It highlights the challenges...
Addressing HR's widening capacity gap with AI
The article outlines the challenges HR departments face as demands increase and resources remain limited, highlighting the widening capacity gap exacerbated by post-pandemic...
How Superhuman and Databricks built a 200K QPS inference platform together
The article describes the collaboration between Superhuman and Databricks in developing a high-performance inference platform capable of handling over 200,000 queries per second (QPS) with stringent...
Text-Conditional JEPA for Learning Semantically Rich Visual Representations
The article introduces Text-Conditional JEPA (TC-JEPA), a new framework for learning semantically rich visual representations by leveraging image captions to modulate predicted features. This...
What Matters in Practical Learned Image Compression
The article presents a comprehensive study on learned image compression codecs, emphasizing their optimization for the human visual system. It highlights the development of a new codec that...
From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs
The paper introduces the Spatial-Functional Intelligence Benchmark (SFI-Bench), aimed at evaluating the advanced reasoning capabilities of multimodal large language models (MLLMs). It highlights the...
Normalizing Flows with Iterative Denoising
The article presents advancements in Normalizing Flows (NFs) through the introduction of iterative TARFlow (iTARFlow), a generative model that combines autoregressive generation with iterative...
SpecMD: A Comprehensive Study on Speculative Expert Prefetching
The article presents SpecMD, a standardized framework designed for benchmarking caching strategies in Mixture-of-Experts (MoE) models. It highlights the importance of an expert caching mechanism to...
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
The article discusses a novel approach to Key-Value (KV) caching in transformer language models, focusing on reducing memory footprint while maintaining high throughput during autoregressive...
Generative AI for Business: A Complete Strategy and Implementation Guide
The article discusses the transformative potential of generative AI in business, highlighting its ability to create significant economic value across various sectors. It emphasizes the importance of...
LLM Vs AI: A Practical Guide to Differences, Use Cases, and Tools
This article serves as a comprehensive guide to understanding the distinctions between large language models (LLMs) and the broader field of artificial intelligence (AI). It outlines the scope, core...
PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning
The article introduces PORTool, an importance-aware policy optimization algorithm designed for multi-tool-integrated reasoning in agents powered by large language models (LLMs). It addresses the...
Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding
The article discusses accelerating large language model (LLM) inference with block diffusion speculative decoding, specifically the DFlash method, on Google...
How AI-Driven Kubernetes Optimization Reclaimed Millions from 47% Idle Capacity
The article discusses Salesforce's challenges with infrastructure scaling on its Hyperforce platform, particularly regarding over-provisioning and idle capacity in Kubernetes services. It introduces...